Fine-Grained Head Pose Estimation Without Keypoints
Estimating the head pose of a person is a crucial problem with many
applications, such as aiding gaze estimation, modeling attention, fitting 3D
models to video, and performing face alignment. Traditionally, head pose is
computed by estimating keypoints from the target face and solving the
2D-to-3D correspondence problem with a mean human head model. We argue that
this is a fragile method because it relies entirely on landmark detection
performance, the extraneous head model and an ad-hoc fitting step. We present
an elegant and robust way to determine pose by training a multi-loss
convolutional neural network on 300W-LP, a large synthetically expanded
dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from
image intensities through joint binned pose classification and regression. We
present empirical tests on common in-the-wild pose benchmark datasets which
show state-of-the-art results. Additionally, we test our method on a dataset
usually used for pose estimation with depth and start to close the gap with
state-of-the-art depth pose methods. We open-source our training and testing
code as well as release our pre-trained models.
Comment: Accepted to the 2018 IEEE Conference on Computer Vision and Pattern
Recognition Workshops (CVPRW).
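The joint binned-classification-and-regression idea described above can be sketched in miniature. The 66 three-degree bins covering [-99, 99) and the loss weight `alpha` below are illustrative assumptions, not necessarily the paper's exact configuration:

```python
import numpy as np

# Sketch of a multi-loss head pose readout: the network emits logits over
# coarse angle bins; cross-entropy supervises the binned label, while the
# continuous angle, recovered as an expectation over the bin distribution,
# is supervised with a regression loss.
BIN_CENTERS = np.arange(66) * 3.0 - 99.0  # assumed 3-degree bins, in degrees

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def expected_angle(logits):
    """Continuous angle: expectation of the bin centers under softmax."""
    return float(softmax(logits) @ BIN_CENTERS)

def multi_loss(logits, true_angle, alpha=0.5):
    """Binned cross-entropy plus alpha-weighted MSE on the expected angle."""
    true_bin = int(np.clip((true_angle + 99.0) // 3.0, 0, 65))
    ce = -np.log(softmax(logits)[true_bin] + 1e-12)
    mse = (expected_angle(logits) - true_angle) ** 2
    return ce + alpha * mse
```

The expectation over bins is what lets a classification head produce a fine-grained continuous prediction.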
Learning to Localize and Align Fine-Grained Actions to Sparse Instructions
Automatic generation of textual video descriptions that are time-aligned with
video content is a long-standing goal in computer vision. The task is
challenging due to the difficulty of bridging the semantic gap between the
visual and natural language domains. This paper addresses the task of
automatically generating an alignment between a set of instructions and a
first-person video demonstrating an activity. The sparse descriptions and ambiguity
of written instructions create significant alignment challenges. The key to our
approach is the use of egocentric cues to generate a concise set of action
proposals, which are then matched to recipe steps using object recognition and
computational linguistic techniques. We obtain promising results on both the
Extended GTEA Gaze+ dataset and the Bristol Egocentric Object Interactions
Dataset.
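A toy version of the matching step can illustrate the idea: action proposals carrying recognized object labels are matched to recipe steps in order. The Jaccard scoring and the order-preserving greedy pass here are illustrative assumptions, not the paper's actual method:

```python
def jaccard(a, b):
    """Overlap between two label sets (0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def align_proposals_to_steps(proposals, steps):
    """Greedy, order-preserving alignment: each proposal is matched to the
    best-scoring step at or after the previously matched step.

    proposals: list of object-label lists, one per action proposal
    steps: list of word lists, one per recipe step
    """
    alignment, last = [], 0
    for objs in proposals:
        scores = [jaccard(objs, step) for step in steps[last:]]
        best = max(range(len(scores)), key=scores.__getitem__) + last
        alignment.append(best)
        last = best
    return alignment
```

The monotonicity constraint encodes the assumption that a demonstrator performs recipe steps roughly in order.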
Modelling seasonal environmental preferences of tropical tuna purse seine fisheries in the Mozambique Channel
The spatio-temporal environmental preferences and biomass aggregation of tropical tuna caught by the purse seine
fishery in the Mozambique Channel (MZC) have barely been investigated. In this study, tuna biomass
from Fish Aggregating Devices (FADs) and Free-Swimming Schools (FSC), collected from Spanish fishing logbooks
during 2003–2013, was modelled separately as a function of a set of oceanographic variables (sea surface
temperature, sea surface height, geostrophic currents, salinity, and chlorophyll-a) using Generalized Additive
Models (GAMs). Temporal variables (day of the year, month, and year) and spatial variables (latitude and longitude)
were included in the models to account for the spatio-temporal structure of tropical tuna biomass
aggregation. Oceanographic, temporal, and spatial effects on aggregated catches differed between fishing
modes, although some common patterns emerged across the study area and period. Fishable patches of
tuna biomass accumulation were explained by sea surface temperature, productivity, sea surface height, and
geostrophic currents, in addition to interactions among the spatio-temporal variables. Although the models
predicted slightly different preferred fishing spots for each fishing mode, the two partially overlapped.
Goodness of fit for the selected variables showed that the models predicted tuna catch aggregation patterns in the MZC
reasonably well. These results highlight a connection between the biophysical state of the oceans and purse seine
tuna catches in the MZC, and may ultimately contribute to scientific advice for the appropriate management
and conservation of the resources exploited by purse seine fleets in the MZC.
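The additive-model idea behind a GAM can be illustrated with a simple least-squares stand-in: the response is modelled as a sum of per-covariate smooth functions. The polynomial basis below is a crude substitute for GAM smoothers, and the variable names are hypothetical standardized covariates, not the study's data:

```python
import numpy as np

def poly_basis(x, degree=4):
    """Polynomial basis for one covariate, a crude stand-in for GAM smoothers."""
    return np.column_stack([x ** k for k in range(1, degree + 1)])

def fit_additive(X, y, degree=4):
    """Least-squares additive model: y ~ intercept + sum_j f_j(x_j)."""
    design = np.column_stack(
        [np.ones(len(y))] + [poly_basis(X[:, j], degree) for j in range(X.shape[1])]
    )
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    def predict(Xnew):
        d = np.column_stack(
            [np.ones(len(Xnew))]
            + [poly_basis(Xnew[:, j], degree) for j in range(Xnew.shape[1])]
        )
        return d @ coef

    return predict
```

A real GAM would use penalized spline smoothers with automatic smoothness selection; the additive structure, each covariate contributing its own fitted curve, is the shared idea.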
Platypus: Quick, Cheap, and Powerful Refinement of LLMs
We present Platypus, a family of fine-tuned and merged Large
Language Models (LLMs) that achieves the strongest performance and currently
stands at first place in HuggingFace's Open LLM Leaderboard as of the release
date of this work. In this work we describe (1) our curated dataset
Open-Platypus, which is a subset of other open datasets and which we release
to the public, (2) our process of fine-tuning and merging
LoRA modules in order to conserve the strong prior of pretrained LLMs, while
bringing specific domain knowledge to the surface, and (3) our efforts in checking
for test data leaks and contamination in the training data, which can inform
future research. Specifically, the Platypus family achieves strong performance
in quantitative LLM metrics across model sizes, topping the global Open LLM
leaderboard while using just a fraction of the fine-tuning data and overall
compute that are required for other state-of-the-art fine-tuned LLMs. In
particular, a 13B Platypus model can be trained on a single A100 GPU
using 25k questions in 5 hours. This is a testament to the quality of our
Open-Platypus dataset, and opens opportunities for more improvements in the
field. Project page: https://platypus-llm.github.io
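The LoRA-merging step mentioned above can be illustrated in miniature: an adapter's low-rank update is folded back into the base weight so inference needs no extra parameters. The shapes and the alpha/r scaling follow the standard LoRA convention; this is a sketch of the general technique, not the Platypus merging pipeline:

```python
import numpy as np

def merge_lora(W, A, B, alpha=16.0, r=4):
    """Fold a LoRA adapter into the base weight: W' = W + (alpha / r) * B @ A.

    W: (d_out, d_in) frozen base weight
    A: (r, d_in) and B: (d_out, r) low-rank adapter factors
    """
    return W + (alpha / r) * (B @ A)
```

Because the update is rank-r, the adapter trains only r * (d_out + d_in) parameters per layer, yet after merging the forward pass is a single dense matmul again.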